Improve worst-case performance of inline.text regex #1460

Merged 1 commit on Apr 5, 2019

Conversation

andersk
Contributor

@andersk andersk commented Apr 3, 2019

The old regex may take quadratic time to scan for potential email addresses starting at every point. Fix it to avoid scanning from points that would have been in the middle of a previous scan.

Marked version:

0.1.3 and later (problem introduced by commit 00f1f7a)

Markdown flavor: GitHub Flavored Markdown

Description

  • Fixes DoS issue reported privately.

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

  • Draft GitHub release notes have been updated.
  • CI is green (no forced merge required).
  • Merge PR

@UziTech
Member

UziTech commented Apr 3, 2019

Actual diff of the GFM inline.text regex:

- /^(`+|[^`])[\s\S]*?(?=[\\<!\[`*~]|\b_| {2,}\n|https?:\/\/|ftp:\/\/|www\.|[a-zA-Z0-9.!#$%&'*+\/=?^_`{\|}~-]+@|$)/
+ /^(`+|[^`])(?:[\s\S]*?(?:(?=[\\<!\[`*~]|\b_| {2,}\n|https?:\/\/|ftp:\/\/|www\.|$)|[^a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-](?=[a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-]+@))|(?=[a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-]+@))/
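To make the slowdown concrete, here is a minimal sketch that runs the two GFM regexes from the diff above against a long run of `a` characters, the input mentioned later in this thread. The helper `timeExec` and the chosen input sizes are assumptions made for illustration, not part of marked. With the old regex, the email lookahead rescans the rest of the run from every position, so the time should grow roughly with the square of the input length. The new regex only tries the email lookahead right after a character that cannot be part of an email local part.

```javascript
// Regexes copied from the diff above (GFM inline.text, before and after).
const oldText = /^(`+|[^`])[\s\S]*?(?=[\\<!\[`*~]|\b_| {2,}\n|https?:\/\/|ftp:\/\/|www\.|[a-zA-Z0-9.!#$%&'*+\/=?^_`{\|}~-]+@|$)/;
const newText = /^(`+|[^`])(?:[\s\S]*?(?:(?=[\\<!\[`*~]|\b_| {2,}\n|https?:\/\/|ftp:\/\/|www\.|$)|[^a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-](?=[a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-]+@))|(?=[a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-]+@))/;

// Hypothetical helper: run one match and report the elapsed wall-clock time.
function timeExec(regex, input) {
  const start = Date.now();
  const match = regex.exec(input);
  return { matched: match ? match[0] : null, ms: Date.now() - start };
}

// Input sizes are kept small so the sketch finishes quickly; larger sizes
// make the gap between the two regexes easier to see.
for (const n of [1000, 2000, 4000]) {
  const input = 'a'.repeat(n);
  const before = timeExec(oldText, input);
  const after = timeExec(newText, input);
  console.log('n=' + n + ' old=' + before.ms + 'ms new=' + after.ms + 'ms');
}
```

On these inputs both regexes return the same match (the whole run), so only the running time differs. Absolute timings depend on the machine and the JavaScript engine.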

Member

@UziTech UziTech left a comment

Could you add a redos test in /test/redos/ that will fail before this change and pass after?

@andersk andersk force-pushed the inline-text-quadratic branch from 175fae6 to 18dbc0b Compare April 3, 2019 23:44
@andersk
Contributor Author

andersk commented Apr 3, 2019

(Updated to address a separate quadratic slowdown in the same regex.)

@UziTech Do you want me to literally drop in a gigantic .md file consisting of aaaaaaaaaaaaa…, or should we find a way to test this more intelligently?

@UziTech
Member

UziTech commented Apr 3, 2019

Tests that take longer than 1 second are marked as failed. Maybe slim it down to a test that takes about 2 seconds before this fix.
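As a rough illustration of the sizing advice above, the sketch below shows one way to check a time limit and to estimate how long an input must be to hit a target time under a quadratic slowdown. Both `finishesWithin` and `scaleInputLength` are hypothetical helpers written for this example; they are not marked's actual test runner, and the numbers are illustrative only.

```javascript
// Hypothetical helper, not marked's test runner: time a synchronous function
// and report whether it finished within the limit. A synchronous regex match
// cannot be interrupted, so the check happens after the call returns.
function finishesWithin(fn, limitMs) {
  const start = Date.now();
  fn();
  const elapsedMs = Date.now() - start;
  return { elapsedMs: elapsedMs, ok: elapsedMs <= limitMs };
}

// For a quadratic slowdown, time grows with n^2, so the input length needed
// to reach a target time scales with the square root of the time ratio.
function scaleInputLength(n, measuredMs, targetMs) {
  return Math.round(n * Math.sqrt(targetMs / measuredMs));
}

// Illustrative numbers: if 10000 characters took 500 ms before the fix,
// about 20000 characters should take about 2000 ms.
console.log(scaleInputLength(10000, 500, 2000)); // 20000
```

An input sized this way should fail the 1 second limit before the fix while staying small enough to commit to the repository.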

@andersk andersk force-pushed the inline-text-quadratic branch from 18dbc0b to bd789b3 Compare April 4, 2019 00:00
@andersk
Contributor Author

andersk commented Apr 4, 2019

@UziTech Done.

Member

@UziTech UziTech left a comment

Thanks for working on this 💯 🏅

lib/marked.js Outdated
The old regex may take quadratic time to scan for potential line
breaks or email addresses starting at every point.  Fix it to avoid
scanning from points that would have been in the middle of a previous
scan.

Signed-off-by: Anders Kaseorg <[email protected]>
@andersk andersk force-pushed the inline-text-quadratic branch from 830413b to be27472 Compare April 4, 2019 18:14
@UziTech UziTech requested a review from davisjam April 4, 2019 20:23
@UziTech
Member

UziTech commented Apr 4, 2019

@davisjam Do you want to look at this and make sure no ReDoS vectors are added?

@@ -546,7 +546,7 @@ var inline = {
code: /^(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)/,
br: /^( {2,}|\\)\n(?!\s*$)/,
del: noop,
- text: /^(`+|[^`])[\s\S]*?(?=[\\<!\[`*]|\b_| {2,}\n|$)/
+ text: /^(`+|[^`])(?:[\s\S]*?(?:(?=[\\<!\[`*]|\b_|$)|[^ ](?= {2,}\n))|(?= {2,}\n))/
Contributor

I believe this is safe
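For the line-break half of the change, a small sketch along the same lines: it compares the old and new non-GFM `text` regexes from the hunk above on a long run of spaces that is not followed by a newline. With the old regex, the ` {2,}\n` lookahead rescans the remaining spaces from every position. The new regex only tries that lookahead after a non-space character. The variable names and input sizes are assumptions made for this example.

```javascript
// Regexes copied from the hunk above (non-GFM inline.text, before and after).
const oldPlain = /^(`+|[^`])[\s\S]*?(?=[\\<!\[`*]|\b_| {2,}\n|$)/;
const newPlain = /^(`+|[^`])(?:[\s\S]*?(?:(?=[\\<!\[`*]|\b_|$)|[^ ](?= {2,}\n))|(?= {2,}\n))/;

// Hypothetical helper: elapsed wall-clock time of a single match, in ms.
function elapsedMs(regex, input) {
  const start = Date.now();
  regex.exec(input);
  return Date.now() - start;
}

// A long run of spaces with no newline after it: every candidate line break
// fails, which is the worst case for the old lookahead.
for (const n of [1000, 2000, 4000]) {
  const input = 'x' + ' '.repeat(n) + 'y';
  console.log('n=' + n +
    ' old=' + elapsedMs(oldPlain, input) + 'ms' +
    ' new=' + elapsedMs(newPlain, input) + 'ms');
}

// Ordinary text followed by a hard line break: both regexes stop at the same
// place, before the two trailing spaces.
console.log(JSON.stringify(oldPlain.exec('foo  \nbar')[0]));
console.log(JSON.stringify(newPlain.exec('foo  \nbar')[0]));
```

This only compares the two regexes on these particular inputs; it is not a claim that they match identically on every input.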

.replace(']|', '~]|')
.replace('|$', '|https?://|ftp://|www\\.|[a-zA-Z0-9.!#$%&\'*+/=?^_`{\\|}~-]+@|$')
.getRegex()
text: /^(`+|[^`])(?:[\s\S]*?(?:(?=[\\<!\[`*~]|\b_|https?:\/\/|ftp:\/\/|www\.|$)|[^ ](?= {2,}\n)|[^a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-](?=[a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-]+@))|(?= {2,}\n|[a-zA-Z0-9.!#$%&'*+\/=?_`{\|}~-]+@))/
Contributor

I believe this is safe

Contributor

@davisjam davisjam left a comment

LGTM

@UziTech UziTech merged commit b1ddd3c into markedjs:master Apr 5, 2019
@UziTech
Member

UziTech commented Apr 5, 2019

This will be released in v0.6.2 🎉 #1441
